Forum talk:Semantic data for this wiki
Semantic design feedback
Najevi, I don't know anything about this particular subject matter other than how to rig a whisker pole, which I expect is not too relevant to virtual skippering. Is there an example article where you have modeled how the information from races is made visible to the visitor?
*Without stepping through some real examples of your use of SMW, it is hard to evaluate the soundness of your design.
*To your specific questions:
**Data namespace: You can do pseudo-namespaces by simply naming a page Data:Foo Race2009-07-11-3482. Making it a real separate namespace has a few advantages; for example, autocompletion can be constrained to a namespace, but you can do this using categories too. What is the motivation for asking for a separate namespace?
**Your import plan: That's one way to go. I haven't looked closely at it because it is a mismatch for our needs.
***Have you experimented with hand-creating an XML file for importing the data? Does Import accept it satisfactorily?
***As I mentioned, we too are importing data, but it is highly noisy, possibly requiring hand edits on each element. For this reason, we are planning on using an application like wikipedia:AutoWikiBrowser with a custom plug-in for import so that contributors can hand-verify and correct imported data, page by page. This advantage of manual intervention appears to be irrelevant for what you are doing, so Import seems like a reasonable way to go. Presumably your data is totally uniform, being created as output from another program, so the programming would probably be fairly straightforward. You will probably want to choose a language that everyone can use so that the importer is supportable by others in the future. I don't have any recommendations on that score. Language choice should probably be based on what most people who would contribute to your site are likely to be familiar with. -User:Phlox 16:31, 13 July 2009 (UTC)
Thank you for this follow up.
I'd been active at smwtest.wikia.com these past couple of weeks so hadn't noticed your message until today.
# I have not yet done a mock-up of presenting race data. It isn't the kind of data that one browses; rather, you might query/filter it and view summary tables or reports. (See also: Forum:Semantic_data_for_this_wiki#Race) Although I don't yet have experience with semantic Concepts, I imagine using them to provide predefined queries for users to add to their personal list of favorites (along with favorite boats, skins, tracks, etc.). Users can add whichever canned queries they like to their list of favorites and enjoy that data displayed at their user page. (See MediaWiki:Welcome-user-page/pushed messages for the mechanism I plan to use for introducing users to this idea of canned queries displayed at your user page. It's only showing an empty list of favorites at this stage.)
# Based on what I understand of SMW so far I have no specific motivation to request a separate namespace.
#* The one possibility I might imagine is the ability to exclude hits on those race result pages from search results. (Race result pages will include user login names and nicknames for their boats. Sometimes these will include keywords that a visitor might search for and it would be meaningless to direct such a visitor to a race results page. This may also prove the case for "race track" pages.)
#* |autocomplete on category= and |values from category= have already proven sufficient for all needs I have been able to imagine while testing at smwtest.wikia.com.
#* I am thinking I'll use the Project: namespace for race result pages initially.
# The race results raw data is created by the game client so it is regular, just as you anticipated: see Forum:Semantic_data_for_this_wiki#Race_results for a sample. At a previous wiki I tested the creation of import-compatible XML using a rather clumsy awk script and yes, I recall having to do lots of manual touch-up.
I had less regular data to work with then so I expect "smoother sailing" with the highly regular race results data. I have not used AWB before but have noticed it mentioned in IRC #wikia often so will be sure to study that option before proceeding.
; Fan list -cum- Favorites list
: One especially challenging problem I ran into was providing a simple two-radio-button form for a user to add or remove their name to/from a comma-delimited list of usernames without exposing other usernames on that list to accidental deletion (or addition) by a clumsy user.
:* After 2 weeks learning about Semantic Forms I've realized that SF is not suited to this purpose. Once I realized that SF only works with the arguments to named parameters in a form's template call I began to appreciate its limitations. (Otherwise, very useful of course.) I could use client-side JavaScript to manage the selection/deselection of values in a list of possible usernames and stick with an SF-based solution.
:* I have yet to study a non-SF form design to achieve the same purpose. I imagine working directly with the semantic data on a page and the string parser function equivalents of the #ArrayUnion: and #ArrayDiff: parser functions. (String #Replace: seems to do some of what I need.)
najevi 00:20, 21 July 2009 (UTC)
The ideal race results file
Sounds like it's a plan. I've been playing around with the whole idea of gathering data since vsk 1, so I might have a bit of a head start.
1st. It shouldn't become a data mining project, unless you have resources (hardware, space, time) to spare.
2nd. KISS: Keep it Stupid Simple.
3rd. Data, especially results, must be trustworthy.
The VSK5 resultfiles are trustworthy because:
:a) they use the unique vsk-id (vsk account, login)
:b) results of finishers are synched, except for the last participant in each user's file (there have been incidental deviations which I couldn't track, but basically I believe this to be the case)
We can identify objects that relate to a result in a race: track, boat, participant.
*A participant has an identifier; this must be the vsk-id (the alias can be changed).
*A boat has an identifier: an md5 checksum.
*A track has an identifier: an md5 checksum. (I think)
Basically all participants would have the same result but for the last participant. The only means I can identify for verifying a result file is checking it against those of all opponents. This can only partly work, as only participants that are present (connected) can receive and synchronise data. All events before a disconnect must/should be synchronised. AFAIK this basically represents the situation at present (VSK5 08-2009).
A result can be majorly influenced by the participants' behaviour, whether controlled through race modes (auto trim vs. manual trim, tactical vs. arcade) or through a voluntarily agreed upon set of rules. I've come to the conclusion that it should be left to the participants whether they feel a race result should be categorised. Participants want
* results contained in a self-determined category
* control over which results go into a category
This is comparable to the ISAF grading of events, including qualifying standards. This also implies that results may need to be altered in case of protests, appeals etc. Only after results become trustworthy does any analysis of the results become trustworthy. Any related data like weather parameters or tide parameters have no more meaning than the results themselves, and all of those have no more meaning than the stability and speed of the network in which the race was conducted.
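The cross-check against opponents described above could be sketched roughly as follows. This is only an illustration under assumptions: the function name and the in-memory layout (one list of (vsk-id, finish time) pairs per submitted file, in finishing order) are hypothetical, and the rule applied is simply the observation from this post that every file should agree on all finishers except the last one it records.

```python
def files_agree(files):
    """Check submitted result files against each other.

    files: dict mapping a submitter's vsk-id to that submitter's rows,
           each row a (vsk_id, finish_time) tuple in finishing order.
           (Hypothetical layout, not the real VSK5 file format.)

    Returns True when every file matches the longest file on all rows
    except its own last row, which is the one expected to deviate.
    """
    reference = max(files.values(), key=len)
    for rows in files.values():
        checked = rows[:-1]  # ignore the last recorded finisher in this file
        if checked != reference[:len(checked)]:
            return False
    return True


# Example: the host stayed to the end, a guest left after the second boat.
host_rows = [("a", 100.0), ("b", 105.0), ("c", 110.0)]
guest_rows = [("a", 100.0), ("b", 105.4)]
print(files_agree({"host": host_rows, "guest": guest_rows}))
```

A file that disagrees on an earlier finisher fails the check, which is the most this kind of comparison can establish; it cannot say which of the disagreeing files is correct.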
Statistically some parameters may show significance:
*Network speed
*Host capability
*Wind strength
*Boat model
*Track
*Race Category
*Weather
*Tide
Data on tide and geographical conditions is found in the tidal maps and wind maps. I've no idea how they can be taken into account. But fortunately these are properties of a track so it suffices to identify the track. Any track property can be referenced later. Some parameters of tracks are set by the host on the fly in each race; those parameters should be part of the result file (Boat, Wind, Track etc.). The network is also conditional to the race and should be included in the resultfile (ping? bandwidth?). If the host id is recorded as such in the resultfile, the host capability can be checked by benchmarking for reference later. -- 14:25, 6 August 2009 (UTC) --80.153.18.114 that was --Admiral 1 14:26, 6 August 2009 (UTC)
----
This is great feedback, Admiral. Network stability/latency is an important race condition that I'd not acknowledged.
:Funny thing is I was posting at your forum as you were posting here! I'll not repeat that prioritized list here since it resembles the main article anyway.
:By the way, look over the wiki markup that I applied to your message above. A few simple characters to learn for enumerated lists and indented lists and you are good to go!
The file timestamp consistency
* Is it not sufficient to treat the Host's Result.CSV file as the definitive race results? I'd like to better understand the comments about the last participant. I assume that refers to the last boat to finish a race, correct? So if I understand correctly the date-time in the Result_YY_MM_DD_HH_MM_SS.csv filename is identical for all skippers except the last skipper to finish the race. Is this the case even when one user remains in spectator mode from pre-start to post-finish?
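For anyone who wants to test this question against their own files, here is a minimal sketch for comparing the date-time embedded in two result file names. It assumes the Result_YY_MM_DD_HH_MM_SS.csv pattern quoted above with a two-digit year in the 2000s; the function names and the tolerance parameter are made up for illustration.

```python
import re
from datetime import datetime

# Assumed pattern, taken from the discussion: Result_YY_MM_DD_HH_MM_SS.csv
RESULT_NAME = re.compile(
    r"^Result_(\d{2})_(\d{2})_(\d{2})_(\d{2})_(\d{2})_(\d{2})\.csv$"
)


def result_timestamp(filename):
    """Return the datetime encoded in a result file name, or None."""
    match = RESULT_NAME.match(filename)
    if match is None:
        return None
    yy, month, day, hour, minute, second = (int(g) for g in match.groups())
    try:
        # Assumption: two-digit years belong to the 2000s.
        return datetime(2000 + yy, month, day, hour, minute, second)
    except ValueError:
        return None  # digits present but not a valid date-time


def same_race(name_a, name_b, tolerance_seconds=0):
    """True when both names parse and their stamps differ by at most the tolerance."""
    ts_a, ts_b = result_timestamp(name_a), result_timestamp(name_b)
    if ts_a is None or ts_b is None:
        return False
    return abs((ts_a - ts_b).total_seconds()) <= tolerance_seconds


print(same_race("Result_09_08_06_14_25_00.csv",
                "Result_09_08_06_14_25_01.csv",
                tolerance_seconds=1))
```

With a tolerance of zero the two names in the example would count as different races, which is exactly the situation a consistent file name would avoid.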
Track properties
I agree that so long as a Track is uniquely identified by something like filename and MD5 checksum, then data about that track which even a Nadeo design engineer would find difficult to extract algorithmically could be manually extracted after the fact, and only for those Tracks that have proven to be popular or have seen frequent use.
The data mining concept
The beauty of semantic annotation of data is that it is perfect for exactly this data mining purpose. And the really pleasant surprise is that it does this while still keeping it simple, sir! As I see it the key enabler is to make whatever data you might want to mine "naturally occurring". Only Nadeo can help in that regard. I agree that it would be a fool's errand to try to reverse engineer a .Gbx file format and figure out how to extract various pieces of data when we know full well that the right Nadeo engineer could take a list of data requirements and write name/value pairs in a parser-friendly format: not a CSV block within that file but more like an XML-style block within that file. That activity might be a few days' work for one engineer familiar with the internal data structures and so it is a much smarter path to pursue. Therefore I think it comes down to persuading enough members of the VSK community to think broadly rather than narrowly and persuade Nadeo to come to the party and give us a "Next-generation Results" file that makes this data mining activity as easy as falling off a log. (I trust you read and understood the description of how I imagine a directory full of result files being parsed by a race host, letting the tool discard those races the host did not host and prepare a single file for batch upload.)
* It is a two stage process but it is able to be automated and need only be run once a week or even twice a month depending on how urgent an individual host feels it is to have that raw data uploaded for the community to "data mine".
* Hosts who feel a strong sense of duty/obligation to an Event, a Club or to the VSK community as a whole may upload more regularly and so it is important that the process not be an onerous chore for these diligent hosts.
Serious and casual skippers
Some skippers are purely casual racers while some are more serious or better disciplined racers. I want to be able to cater to both ends of the spectrum at this wiki. So whereas a serious group of racers cares about uniformity and consistency of results from one race to the next, the casual skipper is actually going to take great delight in mining for those "lucky days" when they managed to excel in a fleet they did not expect to perform well against. I call this "catching themselves doing something right". It is great for morale and maintaining their interest in the game. With semantic queries, otherwise unconventional (or even obscure) categories can be created after the fact rather than anticipating the need and categorizing ahead of time. I cannot stress enough that this is the beauty of semantic data annotation. One controversial application of this form of data mining is drawing attention to skippers who make a habit of DNF-ing. However, after reading about the "Speed" rating at the RandR site I can imagine how such a measure earned in disciplined racing conditions might be applied to "filter" normally well-behaved skippers from such a list. With your cooperation/guidance I hope to emulate that "Speed" rating at the wiki.
* I guess the key point I want to make here is that I feel a duty to cater to the entire spectrum of skippers.
* I think of it as a Live and let Live philosophy. There may be an arcade-dodgem-car-style of game play that makes the blood of some class of "ship-shape and Bristol-fashion skippers" boil but which is seen by some different class of "kamikaze skippers" as the purest form of fun.
Who really knows what naturally occurring data needs to be saved in order for that "kamikaze-class" to extract relevant Top-Ten lists of skippers who epitomize the ultimate style of game play for their unique interpretation of fun? Currently the two classes mix like oil and vinegar. I haven't begun to fathom what other factions may exist within the team racing genre or match racing genre.
* Over time this will allow like-minded skippers to seek out other like-minded skippers and I think that will allow all players of VSK to enjoy the game much better than they can today.
* Not to mention that a Nadeo representative studying this data may recognize an emerging "faction" of players that might justify some special development direction for a future release.
; Speaking of match racing
: It just occurred to me that pre-start duration might be something worth data-mining. Not the number of restarts, just the duration of the restart.
Network conditions
I need to sleep on this subject. Ping and frame-rate are two obvious raw metrics but the challenge is arriving at some reasonable balance between burdening a client's CPU and bandwidth with data measuring, extraction and reporting versus rendering the graphics and computing the dead-reckoning of competitor boats.
Some questions
What is meant by
* Host capability?
* Race Category?
Please keep the discussion flowing. -- najevi 16:55, 6 August 2009 (UTC)
Which results are in sync and which are not?
* Is it not sufficient to treat the Host's Result.CSV file as the definitive race results? I'd like to better understand the comments about the last participant. I assume that refers to the last boat to finish a race, correct? So if I understand correctly the date-time in the Result_YY_MM_DD_HH_MM_SS.csv filename is identical for all skippers except the last skipper to finish the race. Is this the case even when one user remains in spectator mode from pre-start to post-finish?
:My research so far leads to the conclusion that finish times are synchronised for as long as users are connected to the race.
:In an 8-boat race, user A, who finishes first and leaves the race before no. 8 has finished, will have results for the first 7 boats, where the last boat (the 7th) may not have the same finishing time as in the files of those who stayed until the last boat (8th) finished.
:Also, when the last boat finishes, results are completed and saved; however, the last boat is recorded with different finish times in different files.
:The time differences are in a range that can be explained by client-server latencies (< 1 second).
So the result file name "Result_YY_MM_DD_HH_MM_SS.csv" may or may not be identical, the TIMESTAMP of the file (created) may or may not be the same, and the finish time for the last boat in each respective resultfile may or may not be the same. -- Admiral 1
Thanks for all of that - you have saved me weeks of research! I can understand why a consistent date-time stamp in the file name seen by all participants is very important when race result data may be uploaded by ''any participant or spectator''. After reading about the wrapper shell idea (further down this page) I can appreciate why this synchronization is important to that data usage model.
* The low-tech alternative solution that I keep orbiting back to is to simply rely on the host's copy of the result file for the complete set of results. It is guaranteed to have a record of all finishers as well as all early departures.
Having reiterated that point, I am happy to go along with this idea of properly synchronized file names because it cannot break any plans I have for those files at this wiki. --najevi 10:30, 7 August 2009 (UTC)
Automating data collection
* It is a two stage process but it is able to be automated and need only be run once a week or even twice a month depending on how urgent an individual host feels it is to have that raw data uploaded for the community to "data mine".
* Hosts who feel a strong sense of duty/obligation to an Event, a Club or to the VSK community as a whole may upload more regularly and so it is important that the process not be an onerous chore for these diligent hosts.
By wrapping a batch/shell script around the vsk.exe, all resultfiles may be uploaded with appendable parameters; parsing of all resultfiles can be done through a cron job at the server. All a user would need to do is install the batch/shell script and give parameters for any particular session, or if possible on a per-resultfile basis. -- Admiral 1
This sounds like a really good plan. For my needs I would have that script generate output only at the host's PC. (I understand that your need is different.) That difference aside, I much prefer the idea of the data being immediately transposed into the final format ready for upload. I can imagine how this output might be appended to a file that is then uploaded according to some regular schedule, as suggested. This model would also deal very nicely with checking for the reuse of an identical track. Just continuing with that track data thought for a little longer... Since the information is so compact it may even make sense for each user to have a local file containing the Track name and MD5sum for every race ever hosted (or raced) by that user. With such a file this script has a historical reference to efficiently detect when a previously used Track has just been reused in the most recent race. -- najevi 10:58, 7 August 2009 (UTC)
Follow up on that: every user already has the Track, otherwise automatically saved replays would not be possible. Hence all that is needed is to extract some data already collected and transferred (in the replay) in a usable form from the replay, or alternatively generated at the host (loading the map), and produce this in a unique track file. The track file's md5/filename appended to the resultfile with the race-dependent parameters should do the rest.
--Admiral 1 11:12, 7 August 2009 (UTC) still learning
Categorizing race results
* I guess the key point I want to make here is that I feel a duty to catering to the entire spectrum of skippers.
* I think of it as a Live and let Live philosophy. There may be an arcade-dodgem-car-style of game play that makes the blood of some class of "ship-shape and Bristol-fashion skippers" boil but which is seen by some different class of "kamikaze skippers" as the purest form of fun... What is meant by
* Host capability?
* Race Category?
One of the parameters a user IMHO should supply with a resultfile is the category: oil or water, arcade or expert, automatic pens or manual. Basically, the participants in a race agree upon a set of rules that apply (and I don't mean RRS only ;) I describe resultfiles that fit an agreed upon set of rules as a category. Users should be able to define their own category, which leads to letting users very simply organise an event by stating their own set of rules, publicising them and collecting results for that event. That will not prevent the bad blood between the kamikazes, the arcade-dodgem-car-stylers and the Bristol-fashion skippers, but it will let the kamikazes steer clear of the dodgem-car-stylers, and let the Bristol folks retreat to their own island.
Speaking of match racing
It just occurred to me that pre-start duration might be something worth data-mining. Not the number of restarts just the duration of the restart.
I suggest taking a good hard look at vsk-match.com and pointing everybody interested in match racing that way. If there are any kamikazes or dodgem-car-style match racers, count them, and then convince me to think about some solution for them :)
Host's race experience is different than a guest's experience
* Host capability?
Even though the VSK configuration talks of a peer-to-peer network, the race is conducted over a server-client network.
All data is collected by the host, and distributed by the host. Thus a host suffers less from network latencies, as it 'sees' first, and clients (joiners) on average suffer twice the lag a host suffers.
* human acts
* Client A turns
* Host sees client turn
* Client B sees turn
* human reacts
* Client B turns
* Host sees B turn
* Client A sees B turn
* humans are very lucky
Let's say ping/pong lies around 250 ms on average to the host; that's 500 ms for A and B, or at 20 kts boat speed (roughly 37 km/h, a little over 10 m/s) about 5 meters for A and B each.
So based on the laws of nature hosts experience and see things differently from clients; whether that is an advantage, and by how much, is unknown. In a match race (the only source for halfway reliable data is vsk-match.com) the 2 participants, though server and client, exchange data on a peer (equals) basis during races. --Admiral 1 22:35, 6 August 2009 (UTC)
For the benefit of others following this discussion
The fact is that each user's resultfile may be different (name of the file, timestamp, finish time of the last participant in each, possibly unrecorded participants as the user did not witness them finishing).
;Leads to 2 Problems:
* there is no way to tell which resultfile reflects the truth.
* verifying resultfiles against each other is not possible.
I would like resultfiles to be synchronised, so we have one version only, and everybody has the same content in the file (bitwise).
So far I've worked on some code that strips the spectators and verifies that participants aren't in any parallel races. Before processing, results can be edited so manual protests and redresses can be handled. Verification can only be done manually, e.g. participants must communicate that they do not agree with posted results. --Admiral 1
Various formatting/sections added only to facilitate discussion by section edit. (Admiral - normally I would not edit someone else's contribution to a discussion. This is an exception to that which I hope you understand/approve of.)
--najevi 08:50, 7 August 2009 (UTC)
did it again
Did it again, lost my login through a time out, no clue about styling :) and feel free to edit as you like :) as long as I don't have a clue -- 09:03, 7 August 2009 (UTC)
Wiki markup may take a little time to get used to. I think that the discussion so far has helped identify some of the similarities as well as some differences in how the RandR system and this wiki plan to use various data from VSK. What I'd like to return to now is building upon the list of possible data that we might reasonably ask Nadeo to add to a file (or files) output from the game client.
* So long as the merged list does not have any conflicting requirements then we are copacetic.
The additions you've prompted me to want to add are:
;Event
* Sailing instructions (''not a Nadeo deliverable but rather, a Host's deliverable'')
;Race conditions
* Game version
;Race results
* for DNF/DNS skippers, the elapsed time (e.g. since last restart) until departure from race
;Track
* ''nothing additional so far''
;Other requirements for Nadeo
* Time-date in filename that is consistent for all participants. Does the date-time (UTC/Zulu) of the last restart make sense to use?
I will be offline for the next couple of days. Cable modem died and needs to be replaced. -- najevi 01:32, 8 August 2009 (UTC)
the list
In my view The List should not contain items per se. It should be a standard that defines what data should be in a result, to make it usable for users. Such a definition would make resultfiles future proof and, without wanting to be heretical, could be used for evaluation of any sail race, from real life sail races to other sailing simulators. A definition should use unambiguous language so we should use standard sailing definitions where they apply.
* Track = Course
* Boat = Boatclass (I suggest Boatclass instead of Class as a precaution against Murphy's law, as class may cause problems because it is a reserved word in many programming languages.)
Any and all properties of the race should always be contained in the data set, by reference or as an item.
Course Data
*Area should contain the course area described by the geographical coordinates of the perimeter.
*Tidalmap should contain 12 gridfiles (current vectors) from +6 to -6 hours of high tide for each quarter of the moon. (I suggest contacting producers of navigational software on standards)
*Wind I should contain hourly gridfiles (wind vectors) on the whole hour encompassing the duration of the race. (I suggest contacting the World Meteorological Organization http://www.wmo.int/ on standards)
Race Data
*Wind II should contain gridfiles with wind vectors denoting deviation from the general wind vector, for every general wind vector (can be NULL if not applicable) at relevant time intervals during the race. (I think Alinghi and BMW Oracle may have some stuff on that, no clue whether they would talk about it though :)
*Coordinates of any and all marks and gates, with navigational directives (round/pass to starboard, round/pass to port)
*Any and all parameters (all mentioned here so I'm not going to repeat them), at least the properties that are parented by the object race.
*Any and all parameters that define a participant besides the id, finish time and finish position (disqualification); this should include the team color for team races.
--Admiral 1 09:07, 8 August 2009 (UTC)
Boat Data
* any and all data that allow one to
**a) reproduce an exact physical copy of a boat fitting the boat class
**b) reproduce the exact behavior of that boat in any condition
Note: I think vectors as a minimum are like (49°30'00.000"N, 123°30'00.000"W, 000°, 000,00 m/s); precision should be at least such that reliable significance can be achieved through interpolation. Think of Wind I as the data you get before a race; think of Wind II added to Wind I as the detailed wind during the race, i.e. available after the race.
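To make the vector notation in the note above concrete, here is a minimal sketch that reads one such tuple into plain numbers. It assumes exactly the layout of the example (latitude, longitude, direction in degrees, speed in m/s with a decimal comma, fields separated by a comma and a space); no such gridfile format exists yet, so every name here is hypothetical.

```python
import re

# Degrees, minutes, seconds and a hemisphere letter, e.g. 49°30'00.000"N
DMS = re.compile(r"""(\d{1,3})°(\d{1,2})'(\d{1,2}(?:\.\d+)?)"([NSEW])""")


def dms_to_decimal(text):
    """Convert a degrees/minutes/seconds coordinate to signed decimal degrees."""
    match = DMS.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a DMS coordinate: {text!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and West are negative by the usual convention.
    return -value if hemisphere in "SW" else value


def parse_vector(text):
    """Parse '(lat, lon, direction°, speed m/s)' into a tuple of four floats."""
    fields = text.strip().strip("()").split(", ")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}")
    lat, lon, direction, speed = fields
    return (
        dms_to_decimal(lat),
        dms_to_decimal(lon),
        float(direction.rstrip("°")),
        # The example writes the speed with a decimal comma: 000,00 m/s
        float(speed.replace("m/s", "").strip().replace(",", ".")),
    )


print(parse_vector("(49°30'00.000\"N, 123°30'00.000\"W, 000°, 000,00 m/s)"))
```

The decimal comma in the speed field is why the split is done on a comma followed by a space; a real format would need a less fragile separator.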
All this data should allow reproducing an exact copy of any sail race. Yes, this is what should make a replay, and it may be an idea to reference the replay in the resultfile? --Admiral 1 08:35, 8 August 2009 (UTC)
How To
If Florent reads this he will probably fall off his chair. I think however there are more factions out there that may see use in this, from sail makers to boat builders; I expect however that the best chances are with research institutes. That said, it may suffice to allow for extraction of the data, e.g. gridfiles are no more than an array of vectors. What is blatantly missing in the vsk resultfile now is the metadata, though some of that can be derived, like the start time (it's the file's timestamp minus the finish time of the last participant), and the host can be the user who records the resultfile. Boat, Course, Wind and all can be supplied by the user that submits a resultfile.
Since there may be more important matters to Nadeo it is vital to prioritise a list for Nadeo based on doability and necessity, where doability means spending €'s and necessity could mean revenue. Based on that I feel it is easiest and simplest to add all UI-given parameters set by the host of a race, as an array starting with the host id, at the end of the resultfile. That data should already be in an array of sorts, as that data is exactly what is sent to a 'guest' when the race is joined. Maybe it's possible for us to just have a peek at the raw form of that array in ASCII before any action is taken or even suggested? --Admiral 1 08:29, 8 August 2009 (UTC)
I think we'll have the greatest chance of success if the list:
# is prioritized
# uses terminology that is consistent with that already used by Nadeo, even when those terms are not used by the wider sailing community.
# restricts the scope to known data, i.e. does not go "fishing" for other speculated data that is not already visible to a user.
# enumerates/specifies all pieces of data in totality. This makes the list more immediately actionable, i.e. less time spent by Nadeo interpreting general goals.
# can be implemented with only a minimal man-power investment (I'd suggest maybe one man-week or less.)
--najevi 15:39, 10 August 2009 (UTC)
; Drafting the presentation to Nadeo team
I think that when putting the prioritized list to Florent and team we should make it very clear that the Track related data being requested would ideally be written to a Next-Generation Track-name.Gbx file that provides an ASCII parser-friendly header block that details name/value pairs in whatever format is the safest and easiest for Nadeo to implement in short order. (We don't want such a change to introduce bugs to the game.)
* I am not a big fan of having a separate file containing ASCII data about each track. The risk of mishandling that file or mistakenly associating it with the wrong Track-name.Gbx file seems a little too high. This is a gut feeling for which I cannot offer any concrete explanation or example.
A compromise solution would be to have that track data written to the latter half of the Result.CSV file, which unfortunately breaks its pure comma-separated-variable file format. This is not such a bad compromise because I am thinking that one way or another that pure CSV file format must be broken in order for the following vital data to be communicated:
# login name of Host
# game version
# Name of Boat.ZIP file
# MD5sum of Boat.ZIP file
# Name of Track.Gbx file
# MD5sum of Track.Gbx file
The obvious downside to adding the 17 or 18 pieces of data describing a track is that it causes avoidable file size bloat for the Result.CSV file due to this repeated data that is not changing from one race on that track to the next race on that track.
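To illustrate the compromise, here is a minimal sketch of how a reader might split such a combined file back into its CSV rows and the six items listed above. The layout is purely an assumption for the sake of example (metadata as trailing lines of the form #key=value, with made-up key names, column layout and values); nothing here describes what the game client writes today.

```python
import csv


def split_result_file(text):
    """Separate plain CSV result rows from '#key=value' metadata lines.

    Hypothetical layout: ordinary CSV rows first, then one metadata
    item per line, each starting with '#'.
    """
    data_lines = []
    metadata = {}
    for line in text.splitlines():
        if line.startswith("#"):
            key, _, value = line[1:].partition("=")
            metadata[key.strip()] = value.strip()
        elif line.strip():
            data_lines.append(line)
    rows = list(csv.reader(data_lines))
    return rows, metadata


# Placeholder content only; columns, keys and values are invented.
sample = "\n".join([
    "1,skipperA,00:12:34",
    "2,skipperB,00:12:40",
    "#host=skipperA",
    "#game_version=placeholder",
    "#boat_file=Example.zip",
    "#boat_md5=placeholder",
    "#track_file=Example.Gbx",
    "#track_md5=placeholder",
])

rows, metadata = split_result_file(sample)
print(len(rows), metadata["host"])
```

A tool that only understands pure CSV would still choke on the extra lines, which is the cost of the compromise; a reader like this one can at least ignore metadata keys it does not know.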
If the Nadeo team hate the idea of 17 or 18 pieces of Track.Gbx data being written to each and every Result.CSV file and would prefer to make an ASCII header for each Track-name.Gbx file then the question still remains:
:What is the process for migrating the abundance of legacy format Track.Gbx files to this next-gen file format?
:* e.g. is it practical or even desirable for the game client to overwrite the old format file with the new format upon first use? There might be complications with that aspect of the problem that neither you nor I can anticipate.
A very distant third preference is to have a Track_YYYY_MM_DD_HH_MM_SS.xml file generated for each race.
;Synchronizing time-stamp in file name:
Quite separate from the Next-Gen file format request is the other important matter of making the Result_YYYY_MM_DD_HH_MM_SS.csv filenames match from one user to the next, regardless of whether the user left the race early or stayed until the last boat finished. --najevi 07:31, 14 August 2009 (UTC)
;disconnect result from replay, boat and track
The resultfile data stems from one of the three files. Any data not known to track or boat is in the replay. The resultfile ultimately allows encompassing analysis of races. As csv, the data is human readable. To allow users to digest data and parsers to process data, that data must be human readable. I suggest implementing functions that put the data in respective files: boat.csv, track.csv, result.csv. The data in the respective csv is as trustworthy as its submitter. Hence a Challenge.Gbx can be submitted together with its Challenge.csv. If any user can verify the data in that Challenge.csv there is no issue with deliberate mishandling. To avoid mishandling by mistake, the Challenge.Gbx can be identified by name and md5 in the csv. The same goes for the Replay.Gbx and Boat.zip.
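Identifying a file by name and md5, as suggested above, needs nothing from the game itself. A minimal sketch using Python's standard hashlib module follows; the function names and the two-column (name, md5) row are assumptions for illustration, not an agreed format.

```python
import hashlib
import os


def bytes_md5(data):
    """Return the MD5 checksum of a byte string as lowercase hex."""
    return hashlib.md5(data).hexdigest()


def file_md5(path, chunk_size=65536):
    """Return the MD5 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identity_row(path):
    """Return [file name, md5] for a track, boat or replay file.

    Hypothetical row layout for the accompanying csv.
    """
    return [os.path.basename(path), file_md5(path)]


print(bytes_md5(b"hello"))  # 5d41402abc4b2a76b9719d911017c592
```

MD5 is adequate here for spotting accidental mix-ups between files; it is not a defence against someone deliberately crafting a file to match a checksum.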
Obviously Boat and Track should be identified in the Replay.Gbx and thus also in the human-readable Result.csv. I imagine that one function that extracts the data into a csv will work for all the mentioned Gbx-es. --Admiral 1 17:19, 14 August 2009 (UTC)
: There are two problems with the above line of thinking; however, I am willing and able to go along with the idea of multiple .csv files being generated by the game client after each race (i.e. Result.csv, Boat.csv, Track.csv, Replay.csv). For such a fragmented approach to work, the file names used must all share the same YYYY_MM_DD_HH_MM_SS time stamp and that time stamp must be consistent across all clients connected to a race server. (Currently the Replay.Gbx files do not share any file name resemblance to the Result.csv file. A few of the Replay file names resemble the file name used for a specific Track file.)
: The first problem is that Florent has already explained to me that there is no way to parse the various .Gbx files without the entire game client. I quoted him somewhere but he used words to the effect that it is efficient/expedient for in-house developers but inefficient/unfriendly to any outside party. I don't think that wishing for such a function is going to be productive.
: The second problem has to do with the Replay.Gbx file. Just as we can see varying degrees of partial Result.csv files for skippers who arrive after the last restart but then depart before the last boat finishes a race, so too can we expect varying degrees of partial Replay.Gbx file, depending on what time a skipper joined during the prestart period and what time a skipper departs after finishing a race and before the last boat finishes. So it seems to me that the Replay.Gbx file seen by each skipper is going to have a different checksum. Even the file size in bytes can be expected to be different.
: The 'First problem:' there is no way to parse the various .Gbx files without the entire game client.
For that reason the data should be extracted in a human-readable form, preferably by the game engine. Without knowing any detail, a function to extract the user input at the time a boat is saved, a track is saved or a race is saved should be no problem. What's more, if the function is triggered after saving the various files, the csv file can contain the md5 of the track, boat and race.
: The 'Second Problem:' Result.csv files for skippers who arrive after the last restart but then depart before the last boat finishes a race, so too can we expect varying degrees of partial Replay.Gbx.
In my understanding the Result.csv IS the extracted data in human-readable form from the Replay.Gbx. Unfortunately there is no way around the laws of nature: no game engine will be able to generate the exact same Replay for every player, simply because data must be communicated and that cannot happen faster than the speed of light. No player will ever see the same as another player, unless all players get presented data at the same time by introducing an artificial latency where data is held until all players' computers have processed the data. If the host's data set is determined to be the only valid data set, then clients should be informed about how much disadvantage they can expect (compare it with sailing in fog, or bad weather in real life). By adding the host id to the result file, this is possible. The clients' result files can serve for verification of the host's result file (as a safeguard against manual manipulation). --Admiral 1 06:53, 17 August 2009 (UTC)
:What I am doing with the wish list below is merging that into the list already detailed in the main forum article.
:* To make the priority of each item clear I am prepending an (A), (B), (C) style priority letter to each piece of data described in that main article. We can easily edit those priorities without having to worry about reordering the list.
:* Where information already exists in today's Result.csv file I am assigning the priority (0), meaning that this is already a part of the baseline data set available in the Result.csv file.
:* Where other information can be derived I am indenting that as a second level bullet in the list. In some cases I collapse such lists (look for a more/less link to expand/collapse those blocks of text) so as not to distract from the focus of the main list of "raw" data.
:* I have tried to use the same words that you have suggested to describe each group of data.
: Added a "raison d'etre" section to explain the driving motivation for the requested data.
:--najevi 04:00, 15 August 2009 (UTC)
What remains is a prioritised list of data for each of these csv files, possibly naming data that at present is not recorded at all.

Data Wishlists

Obviously, Boat and Track can be done manually. The top priority should lie with the Replay.Gbx - Result.csv. See for example the Farr40 page in the wiki here. And I remember having hourly isobar maps and tidal maps for a Track in vsk4, with a weather synopsis including shipping warnings: 12 tracks covering a 12 hour period in which a cyclonic weather system passes over the race area during one day.

Boat.csv

All boats are built with a boatparams.xml. All properties that control a boat, from polars to how it surfs, are in human-readable form there. So the only thing Nadeo needs to do is publicise the boatparams for the boats that come with the game. Boat developers generally should get a massive kick in their butt for not being willing to publicise the boatparams of their respective boats. Read about the boatparams in the Nadeo forum here: BoatParam.xml --Admiral 1 06:59, 25 August 2009 (UTC)

Track.csv

Replay.csv

I'd split this into per-participant data and data of the race.

Participant
#position (or array in form 1,1,1,2,3,4,5 for each mark)
#time (or array in form timestamp,timestamp,etc.
for each mark)
#id
#points
#flag (DSQ, DNF, SPEC, NULL)
#alias
#team (red, blue, NULL)

Race
#Boat
#md5 Boat
#Track
#md5 Track
#host
#windforce
#rulemode
#gamemode
#etc
#etc
These are IMHO the minimum needed. I expect you to edit this list, as will I.

Alternatively, data in the Gbx-es that was entered in human-readable form at some point probably still exists in the Gbx-es. Maybe it would be easiest to just have a function that exports that data in csv format. The function should be triggered at about the time the Gbx-es are saved. --Admiral 1 17:19, 14 August 2009 (UTC)
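The Participant and Race lists above could be sketched as two record types plus a csv writer. This is an illustration only: the field types, the `;` separator used inside the per-mark arrays, the column order and the literal `NULL` for missing values are all assumptions, and the open-ended "etc" items are left out.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Participant:
    id: str
    alias: str
    positions: List[int]          # position at each mark, e.g. [1, 1, 2]
    times: List[float]            # timestamp at each mark (seconds, assumed)
    points: int
    flag: Optional[str] = None    # DSQ, DNF, SPEC or None
    team: Optional[str] = None    # red, blue or None


@dataclass
class Race:
    boat: str
    boat_md5: str
    track: str
    track_md5: str
    host: str
    windforce: str                # kept as text; the real type is unknown
    rulemode: str
    gamemode: str


def participants_to_csv(participants):
    """Write one csv row per participant; per-mark arrays are joined with ';'."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "alias", "positions", "times", "points", "flag", "team"])
    for p in participants:
        writer.writerow([
            p.id,
            p.alias,
            ";".join(str(x) for x in p.positions),
            ";".join(str(t) for t in p.times),
            p.points,
            p.flag or "NULL",
            p.team or "NULL",
        ])
    return out.getvalue()


# Invented example data.
example = [
    Participant("p1", "skipperA", [1, 1, 2], [60.5, 125.0, 190.25], 10),
    Participant("p2", "skipperB", [2, 2, 1], [61.0, 126.5, 189.75], 12, team="red"),
]
text = participants_to_csv(example)
```

Joining the per-mark arrays with `;` keeps each participant on a single csv row. A separate row per mark would work equally well and is perhaps easier to query.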